AN ARRANGEMENT AND A METHOD FOR HANDLING AN AUDIO SIGNAL



TECHNICAL FIELD OF THE INVENTION 

The present invention relates to an arrangement and a method
for handling an asynchronous, digital audio signal on a
network in connection with a personal computer.



DESCRIPTION OF RELATED ART 

A personal computer PC that is equipped with different types
of sound devices, such as sound cards, can be used as a
telephone. The PC has a network interface connected to a
telephony application, which in turn is connected to a sound
interface. The latter writes standardized sound messages and
is connected to a first type of sound card via a first
driver. Alternatively, the sound interface is connected to a
universal serial bus USB via a second driver, and the USB is
connected to a second type of sound card.

A local area network LAN, on which data packets are
transmitted asynchronously, is connected to the PC's network
interface. If the data packets are sound packets, the network
interface selects the telephony application, which receives
the sound packets in its buffers.

When the first type of sound card is utilized, the telephony
application informs the sound interface which codec is to be
used. The sound interface sets up an interface to the sound
card and the first driver converts the sound signal before it
arrives at the sound card. This card is an A/D-D/A converter,
converting the signal into a sound signal for a loudspeaker.

When the second type of sound card is used, the sound
interface sends sound packets to the second driver, which
produces an isochronous data flow over the USB. The
isochronous rate is determined by the free capacity on the USB.
The second sound card transforms the data into a sound signal
for a loudspeaker.

These two known methods load the PC heavily. The
transmitted speech is delayed 200-300 ms in the PC, which can
cause deterioration in speech quality. Also, during an
ongoing call, the sound cards in the PC cannot handle other
types of sound, e.g. a game with acoustic illustrations. When
other, non-audio applications are running on the PC, the audio
processing is disturbed, which can degrade the audio to an
unacceptable level.

As an alternative to a sound card connected to a PC, there
exists a hardware board that emulates a complete subscriber
line interface circuit (SLIC), to which an ordinary telephone is
coupled. The hardware board makes no use of an existing PC.

U.S. patent No. 5,761,537 discloses a personal
computer system with a stereo audio circuit. A left and a
right stereo audio channel are routed through the audio
circuit to loudspeakers. A surround sound channel is routed
through a universal serial bus to an additional loudspeaker.
The problem solved is synchronization between the stereo
channels and the surround sound channel. The arrangement is
intended for music.

The Japanese abstracts with publication numbers JP10247139,
JP11088839 and JP59140783 all disclose different methods to
reduce processor workload in computers when processing sound
data.



SUMMARY OF THE INVENTION 

A main problem in transferring an asynchronous digital audio
signal for telephony via a PC equipped with a sound device
such as a sound card is the abovementioned delay and
deterioration of the audio signal.

A further problem is that transferring the audio signal
for telephony involves a heavy workload for the PC. As a
result, the PC cannot simultaneously transfer the
audio signal and handle other audio messages.

Still another problem is a deterioration of speech quality when
non-audio applications are run in parallel with the sound
card.

The abovementioned problems are solved by a sound device
connected to the PC. The sound device handles both incoming
and outgoing speech. The digital audio signal is transferred
asynchronously through the PC between a network, to which the
PC is connected, and the sound device. The main signal
processing of the digital audio signal is performed in the
sound device, which can be designed to handle speech in full
duplex.

In somewhat more detail, the problem is solved in that the signal
processing in the sound device includes A/D-D/A conversion,
coding/decoding in a codec and, when speech is received from the
network, also buffering of the audio signal in a frame
buffer. The codec and the A/D-D/A converter are hardware
devices.

A purpose of the present invention is to shorten the delay
of the transferred audio signal in the PC.

Another purpose is to improve the quality of the audio
signal transferred by the PC.

Still another purpose is to make it possible to simultaneously
handle both the audio signal and other audio messages in the
PC.

A further purpose is to make it possible to simultaneously
handle both the audio signal and non-audio applications in
the PC without deterioration of the speech.

An advantage of the invention is less delay of the audio
signal in the PC.

Another advantage is a higher quality of the audio signal
transferred by the PC, also when other non-audio
applications are running.

Still another advantage is that the audio signal can be transferred
by the PC simultaneously with the processing of other audio
messages.

A further advantage is that using a PC in connection with the 
sound device is cheaper than using a complete SLIC to which a 
telephone is connected. 

The invention will now be described more closely with the aid
of preferred embodiments and with reference to the following
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows a block diagram of a PC with a sound device;

Figure 2 shows a block diagram of a protocol stack;

Figure 3 shows a time diagram of a data packet;

Figure 4 shows a block diagram of the sound device;

Figures 5a and 5b show a flow chart of an inventive method;
and

Figure 6 shows a flow chart of an inventive method.

DETAILED DESCRIPTION OF EMBODIMENTS

Figure 1 shows a personal computer (PC), referenced P1, which
is connected to an inventive sound device SD1 and to a local
area network LAN1. The PC P1 is also connected to traditional
sound cards SC1 and SC2. The PC P1 receives sound packets 5
from the network LAN1, and these packets are processed by the
PC and by either the sound card SC1 or SC2 or the
sound device SD1, as will be described more closely below.
Also, speech as an acoustic signal can be received by the
sound card or the sound device and be converted into signals,
which are processed before transmission on the network LAN1.

First, the sound packet 5 will be described in connection with
figure 2. The sound packet is set up by a protocol RTP (Real
Time Protocol), which is built up of a protocol stack 20 with
a number of layers. In a transport layer 21 a physical
address for a sending device, such as a router, is given. The
address is changed for every new sending device in the
network that the sound packet passes. In an IP layer 22 a
source and a destination are given, and in a UDP layer 23
the addresses of the sending and receiving applications are
given. A next layer 24 is an RTP/RTCP layer in which a control
protocol is generated, which describes how a receiving device
perceives the sent media stream. The layer also includes a time
stamp 25, which indicates the moment when a certain sound packet
was created. A payload type layer 26 describes how user data is
coded, i.e. which codec has been used for the coding.
The user data, which is coded as a number of vector parameters
for music, speech etc., is found as codec frames in a
user data layer 27.
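
The separation between the RTP/RTCP layer 24 and the user data layer 27 can be illustrated by a minimal Python sketch. It assumes the standard 12-byte fixed RTP header, which the description does not spell out. The names RtpHeader and parse_rtp are hypothetical, and header extensions are ignored for brevity.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    payload_type: int  # tells which codec was used for the user data (layer 26)
    sequence: int
    timestamp: int     # moment the sound packet was created (time stamp 25)
    ssrc: int


def parse_rtp(packet: bytes):
    """Split an RTP packet into its fixed header and the codec frames.

    Assumption: no header extension is present (the X bit is not handled).
    """
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F  # each CSRC entry adds 4 bytes before the payload
    header = RtpHeader(
        version=b0 >> 6,
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
    return header, packet[12 + 4 * csrc_count:]
```

In this sketch the returned payload corresponds to the codec frames of the user data layer 27, while the payload type and time stamp are the two header fields that the rest of the description relies on.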

Returning to figure 1, the abovementioned traditional sound
cards SC1 and SC2 and the processing of the sound packets 5
in connection therewith will be described. The PC P1 has a
network interface 3 connected to the network LAN1 and to a
telephony application 1. Other applications are also
connected to the interface 3, exemplified by an application
2. The telephony application 1 has frame buffers B1 for
buffering the sound packets 5 and is connected to a sound
application programming interface (sound API) 6. The latter
is in turn connected to the sound card SC1 via a first driver
D1 and also to the sound card SC2 via a second driver D2 and
a universal serial bus USB 4. The sound cards SC1 and SC2 are
both software applications. The sound API 6 has different
codecs in the form of software applications and writes
standardized sound messages for the sound cards SC1 and SC2.

In the signal processing, digital data packets are
transferred asynchronously on the network LAN1. When
these data packets are the sound packets 5 for telephony, the
interface 3 selects the telephony application 1, to which it
sends the sound packets 5. According to traditional
technology, the sound packets are received in the frame
buffers B1 in the telephony application 1. The sound packets
are queued in the buffers, which then sort the packets
based on the time stamps 25. This sorting includes e.g. that
packets having arrived too late are deleted.

When the sound card SC1 is utilized, the telephony application 1
informs the sound API which of the codecs is to be utilized. The
sound packets are transmitted in consecutive order from the buffer
B1 in the telephony application 1 to the sound API 6. The
latter decodes the sound packets into linear PCM format in
the utilized codec and sets up an interface to the sound card
SC1. The driver D1 then converts the signal to a form
suitable for the sound card SC1. This card is an A/D-D/A
converter, which transforms the signal from its PCM format
into a sound signal intended for a loudspeaker 7. Sound
received by a microphone 8 is processed in the reverse order,
but is not buffered in the buffer B1 before it is transmitted
on the network LAN1.

When the sound card SC2 is used, the
sound API 6 transmits sound packets to the driver D2, which
creates an isochronous data flow over the bus 4. The PCM
coded sound is transmitted over the bus at a rate which
depends on the free capacity on the bus. The sound card SC2
is also an A/D-D/A converter that transforms the signal into a
sound signal intended for the loudspeaker 7. As the
transmission over the bus is isochronous, the sound card SC2
has a small buffer for the PCM coded signal to get the
correct signal rate before the D/A conversion.
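
The sorting performed in the frame buffers B1, in which queued sound packets are ordered by their time stamps 25 and packets that arrive too late are deleted, can be sketched as follows. This is an illustration only: the representation of a packet as a (time stamp, frame) pair and the name order_for_playout are assumptions, and time stamp wrap-around is not handled.

```python
def order_for_playout(packets, last_played_timestamp):
    """Return the queued packets in time stamp order, without late packets.

    packets: list of (timestamp, frame) pairs in arrival order.
    A packet counts as late if its time stamp is not newer than the time
    stamp of the frame that has already been passed on for playout.
    """
    on_time = [p for p in packets if p[0] > last_played_timestamp]
    return sorted(on_time, key=lambda p: p[0])
```

For example, if frames stamped 320, 160 and 0 arrive in that order after the frame stamped 0 has already been played, the sketch returns the frames stamped 160 and 320 and drops the duplicate of 0.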

Use of the traditional sound cards SC1 and SC2 causes a heavy
workload on the PC, and the incoming sound packets are delayed
considerably in the PC, by 200-300 ms. Also, the sound cards have
a heavy workload and cannot process other sound messages
during an ongoing telephone call. The sound cards SC1 and SC2
are mainly used for simplex transmission, i.e. for either
recording or playing back, and have a linear frequency
response designed for music. The cards can be utilized for
speech but are not optimized for it.

It was mentioned above that the data flow on the serial bus 4
is isochronous. This transmission will be briefly described
in connection with figure 3, in which T denotes time. Data 31
is transmitted in packets 32 having a duration of T1
microseconds. The packets 32 are transmitted at a certain
pace that is constant, but can be different on different
occasions, depending on the present traffic situation on the
bus. This means that the duration T1 of the packets can be
different on different occasions, but lies within certain
time constraints. One such constraint is based on the fact
that data must be delivered as fast as it is displayed. If
T1 = 125 microseconds, the data flow is not only isochronous but
also synchronous with a controlling clock, i.e. the data is
transmitted over the bus 4 at specific intervals with the
same pace as it was once produced.
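
The significance of T1 = 125 microseconds can be illustrated with a short calculation. The sketch below assumes a sampling rate of 8 kHz, which is customary for telephony but is not stated in the description; under that assumption one packet interval corresponds to exactly one PCM sample. The function name samples_per_packet is hypothetical.

```python
from fractions import Fraction


def samples_per_packet(packet_duration_us: int, sample_rate_hz: int) -> Fraction:
    """Number of PCM samples produced during one packet interval T1."""
    return Fraction(packet_duration_us * sample_rate_hz, 1_000_000)


# Assumed 8 kHz telephony sampling: one sample every 125 microseconds.
one_interval = samples_per_packet(125, 8000)
```

With a longer packet duration, for example 1000 microseconds, the same assumed sampling rate gives eight samples per packet, which the receiving side has to buffer briefly to restore the original pace.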

The inventive sound device SD1 is briefly shown in figure 1.
It comprises a frame buffer B2 which is connected to a codec
device C2. The latter is connected to a D/A and A/D converter
AD2, which is connected to in/out devices including a
loudspeaker 10, a microphone 11 and a headset 12. A ring
signal device 13 is connected to the sound device. The frame
buffer B2 is connected to the telephony application 1 in the
PC P1 via a line 9 and a driver D3.

When the sound device SD1 is used, the asynchronous sound
packets 5 on the network LAN1 are transferred asynchronously
and unbuffered by the PC P1, in contrast to the transfer in
the abovementioned traditional technology. This means that
the sound packets 5 are transferred asynchronously from the
network LAN1 via the network interface 3 to the telephony
application 1. When arriving at the application 1, the sound
packets are not buffered in the frame buffer B1 but are
transmitted to the driver D3. The driver transmits the sound
packets, still asynchronously, via the line 9 to the sound
device SD1. The driver is responsible for the connection 9,
which includes a connection for transmission of
the sound packets and a connection for control signals to the
sound device SD1, as will be described more closely below. In
the sound device SD1 the sound packets are buffered in the
buffer B2, decoded in the codec device C2 and D/A converted
in the converter AD2, as will be more closely described below.
The loudspeaker 10 and the microphone 11 are parts of a
telephone handset and the headset 12 is an integrated part of
the sound device.



The sound device SD1 is shown in some more detail in figure
4. The frame buffer B2, which is a software buffer, is
connected to the PC P1 by the line 9. The latter comprises a
connection 9a for the sound packets 5 and a control
connection 9b. The frame buffer is connected to the codec
device C2 and transmits sound frames SF1 to it. The codec
device C2 has a number of codecs C21, C22 and C23 for
decoding the sound frames, which can be coded according to
different coding algorithms. The codec device also has a
somewhat simplified auxiliary codec CA which follows the
speech stream, the function of which will be explained below.

WO 03/075150    PCT/SE02/00379

The codec device C2 is a hardware signal processor that is
loaded with the codecs and also has other units 15. An
example of such a unit is an acoustic echo canceller, which
registers sound from the microphone 11 that is an echo of
speech generated in the loudspeaker 10, and cancels the echo
in the following frames. The codec device C2 is connected to
the A/D-D/A converter AD2, which is connected to the in/out
devices 10, 11 and 12. The converter AD2 operates in a
conventional manner, but is a full duplex converter for
simultaneous D/A conversion and A/D conversion. It has a
tone curve that is nonlinear and is adapted for the devices
10, 11 and 12. The properties of these devices are known, and
the analogue tone curve and signal amplification can therefore
be adapted to guarantee the sound volume and quality in
accordance with telephony specifications. The tone curve is
mainly adapted digitally, and only a lower order filter for
noise and hum suppression is used in the analogue part. The
control connection 9b is connected to the frame buffer B2, to
the codec device and to the A/D-D/A converter and also to
the ring signal device 13.



When the sound device SD1 is utilized, the sound packets are
processed in the following manner. Normally the data packets
on the network LAN1 are delayed during the transmission, and
when arriving at the PC P1 they have already been delayed by
the network by 10 ms up to 200 ms. As described earlier, when
the interface 3 senses that the packets are the sound packets
5 for telephony, it sends the packets to the telephony
application 1. When the sound device SD1 is selected to
handle telephony, the telephony application 1 does not buffer
the sound packets but sends them to the driver D3. The driver
sends the sound packets to the bus 4, which transmits the
packets isochronously to the sound device SD1 over the
connection 9a as a signal denoted SP1. This handling in the
PC involves a delay of the sound packets which can vary, but
which in most cases is less than the delay on the network.

The sound packets 5 arriving at the sound device SD1 are
buffered in the frame buffer B2, which then sends the sound
frames SF1 to the appropriate one of the codecs C21, C22 or
C23. The selection of codec will be described later. The
sound in the sound frames is coded in the form of parameters for
speech vectors, which coding can be performed in a number of
different ways. The frame buffer sends the sound frames to
the one of the codecs that corresponds to the present coding
algorithm, and it also sends the frames to the auxiliary
codec CA.

Having the frame buffer B2 close to the codec device C2 opens
a number of possibilities to influence the processing of the
sound packets. One such possibility concerns the varying time
delay in the PC P1. These variations are handled by the frame
buffer B2, which sends the sound frames SF1 at a uniform pace
to the codec device. Another possibility appears when the
buffer reads the time stamps 25 in the sound packets and
notes lost packets. These packets are restored in the
following manner. The auxiliary codec CA receives, as
mentioned, the sound frames and follows the speech stream. The
information collected in that way is used to predict the
speech stream, and a sound frame in a lost packet can be
replaced by a predicted sound frame. Thereby unnecessary
noise in the speech is avoided. It can happen that a
transmitter sends the sound packets 5 a little too slowly.
The frame buffer, transmitting the sound frames at normal
pace to the codec device C2, can therefore become empty. The
auxiliary codec CA then produces noise frames to fill up the
speech and avoid a sudden interruption, which appears as a
click sound in the speech. The frame buffer can also become
overfilled, and the selected codec is then forced to work a
little faster by adjusting its clock. As a result,
the speech will run a little faster and the pitch of
the voice will rise a little.
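
The replacement of lost sound frames can be illustrated by the following Python sketch. It is a simplification under stated assumptions: the time stamps 25 are taken to advance by a fixed step per frame, and the prediction of the auxiliary codec CA is stood in for by repeating the most recent frame, which is not the prediction method of the description. The name conceal_lost_frames is hypothetical.

```python
def conceal_lost_frames(received, first_ts, last_ts, step):
    """Rebuild an unbroken frame sequence from the frames that arrived.

    received: dict mapping time stamp -> frame (bytes).
    Returns a list of (timestamp, frame, origin) tuples, where origin is
    "received", "predicted" (stand-in for the auxiliary codec CA) or
    "noise" (nothing has arrived yet to predict from).
    """
    output = []
    previous = None
    for ts in range(first_ts, last_ts + 1, step):
        if ts in received:
            previous = received[ts]
            output.append((ts, previous, "received"))
        elif previous is not None:
            # Assumption: repeating the last frame replaces real prediction.
            output.append((ts, previous, "predicted"))
        else:
            output.append((ts, b"", "noise"))
    return output
```

The point of the sketch is only that the gap is detected from the time stamps and filled before decoding, so that the codec device receives frames at a uniform pace.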

The codec device C2 decodes the received sound frames,
according to the present embodiment, into PCM samples which
are sent to the A/D-D/A converter AD2. The latter D/A
converts the PCM samples into an analog speech signal SS1 in
a conventional manner. It then sends this speech signal to
the loudspeaker 10 or the headset 12, depending on which one of
them is selected by an operator.

When sound is received in the microphone 11, an analog sound
signal is generated and is A/D converted in the converter AD2
into PCM samples. In the sound device SD1 this A/D conversion
is independent of the D/A conversion of the sound packets 5
received from the network LAN1. The sound device SD1 thus
has the advantage of processing a telephone call in full
duplex. The PCM samples are coded in one of the codecs C21,
C22 and C23 into parameters for speech vectors and are sent
directly to the PC P1 without any buffering in the frame
buffer B2. The PC transmits corresponding sound packets to
the network LAN1 without any buffering in the frame buffer B1
in the telephony application 1.

The above described function of the sound device SD1 is
controlled by control data CTL1 on the control connection 9b,
which data can be used to configure the sound device. The
control data is transmitted asynchronously by a protocol
different from the protocol 20 for the speech. The control
data is transmitted to the frame buffer B2, the codec device
C2, the A/D-D/A converter AD2 and to the ring signal device 13.

When a call comes to the PC P1 via the network LAN1, the
first thing that arrives is a request for a ring signal. This
request is transmitted from the telephony application 1 as
control data to the ring signal device 13, which alerts a
subscriber SUB1. The subscriber takes the call, e.g. by
pressing a response button. A corresponding control signal
CTL2, a "hook off-signal", is sent to the telephony
application, which signals that the call will be received.
When the call itself comes to the PC, the telephony
application 1 configures the sound device by the control data
CTL1 in dependence on the content of the data packets 5. This
configuration includes an order which determines the size of
the buffers in the frame buffer B2 and also an order
specifying which one of the codecs C21, C22 or C23 is to be
used for the call.
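
The configuration carried by the control data CTL1 can be sketched as a small data structure. The sketch is illustrative only. The assignment of the RTP payload types 0, 8 and 18 to the codecs C21, C22 and C23 is an assumption, as is the rule that sizes the buffer from an expected delay variation; the description states only that the configuration depends on the content of the data packets. The names ControlData and configure are hypothetical.

```python
from dataclasses import dataclass

# Assumed mapping from RTP payload type to the codecs of the codec device C2.
CODEC_FOR_PAYLOAD_TYPE = {0: "C21", 8: "C22", 18: "C23"}


@dataclass(frozen=True)
class ControlData:
    codec: str          # which of C21, C22, C23 is to be used for the call
    buffer_frames: int  # size of the buffers in the frame buffer B2


def configure(payload_type: int, expected_jitter_ms: int, frame_ms: int = 20) -> ControlData:
    """Build the configuration order sent to the sound device for one call."""
    if payload_type not in CODEC_FOR_PAYLOAD_TYPE:
        raise ValueError(f"no codec loaded for payload type {payload_type}")
    # Assumed sizing rule: enough frames to cover the delay variation,
    # rounded up, and never fewer than one frame.
    buffer_frames = max(1, -(-expected_jitter_ms // frame_ms))
    return ControlData(CODEC_FOR_PAYLOAD_TYPE[payload_type], buffer_frames)
```

Keeping both orders in one message reflects that the buffer size and the codec choice are set together when the call arrives, before any sound frames are decoded.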

As appears from the above description, the sound device SD1
has advantages in addition to those already mentioned.
The codec device C2 can be controlled by the frame buffer B2
when sound frames are lost, when the transmission is slow and
the frame buffer runs empty, or when the transmission is too fast
and the frame buffer is overfilled. This control is possible
only because the frame buffer B2 and the codec device C2 are
close to each other in the sound device SD1.






The process of taking a telephone call with the aid of the
PC P1 equipped with the sound device SD1 will be summarized
in connection with figures 5a and 5b. The PC receives from
the network LAN1 a request RT1 for a ring tone according to a
step 31. In a step 32 the ring tone request is transmitted to
the ring signal device 13, which generates a ring signal. The
subscriber SUB1 takes the call in a step 33, and the hook
off-signal CTL2 is generated and is sent back on the network.
In a step 34 the sound packets 5 are transmitted to the
network interface 3 of the PC P1. The telephony application 1
receives the sound packets in a step 35 and selects the width
of the buffers in the frame buffer B2 in a step 36. In a next
step 37 the telephony application selects the appropriate one
of the codecs C21, C22 or C23. The codec selection and the
buffer width selection are performed by the control signal
CTL1. The sound packets are transmitted asynchronously to the
frame buffer B2 in the sound device SD1 according to a step
38. The process continues at A in figure 5b. In a step 39 it
is investigated by the frame buffer whether any sound packet
is lost. In an alternative YES a sound frame is generated by
the auxiliary codec CA according to a step 40. After this
step, or if according to an alternative NO there is no lost
sound packet, it is investigated according to a step 41
whether the frame buffer B2 is empty. In an alternative YES
the auxiliary codec CA generates a noise sound frame, step
42. After this step, or if according to an alternative NO
there are still frames in the frame buffer, it is
investigated whether there is any risk that the frame buffer
B2 will become overfilled, step 43. In an alternative YES the
selected codec is speeded up by adjusting its clock according
to a step 44. After step 44, or if according to an
alternative NO there is still space in the frame buffer, the
sound frames are decoded by the selected codec according to a
step 45. In a step 46 the decoded frames are D/A converted in
the converter AD2 into the signal SS1 and in a step 47 sound
is generated in the loudspeaker 10.
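
The sequence of decisions in steps 39 to 47 can be summarized in a short Python sketch. It only mirrors the order of the checks described above; the three boolean inputs and the name plan_frame_actions are assumptions made for illustration, and the returned strings stand in for the actual operations of the sound device.

```python
def plan_frame_actions(packet_lost, buffer_empty, buffer_nearly_full):
    """List the actions of figure 5b for one pass through the flow chart."""
    actions = []
    if packet_lost:            # step 39, alternative YES
        actions.append("step 40: auxiliary codec CA generates a sound frame")
    if buffer_empty:           # step 41, alternative YES
        actions.append("step 42: auxiliary codec CA generates a noise frame")
    if buffer_nearly_full:     # step 43, alternative YES
        actions.append("step 44: speed up the selected codec clock")
    # Steps 45 to 47 are always carried out.
    actions.append("step 45: decode sound frames")
    actions.append("step 46: D/A convert into signal SS1")
    actions.append("step 47: generate sound in loudspeaker 10")
    return actions
```

Each check falls through to the next one whether or not its corrective step was taken, which matches the wording "after this step, or if according to an alternative NO" in the description.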

In connection with figure 6 the process of making a
telephone call with the aid of the PC P1 equipped with the
sound device SD1 will be summarized. In a step 61 the call is
initiated, including that the subscriber SUB1 dials the number
of a called subscriber. The information in connection with
that is transmitted by a control signal CTL2. While the call
is going on, sound is received by the microphone 11, step 62.
In a step 63 an analog sound signal SS2 is generated and in a
step 64 the signal SS2 is A/D converted into PCM samples. In
a step 65 one of the codecs C21, C22 or C23 is selected and
in a step 66 the selected codec codes the PCM samples into
frames with speech vectors. Sound packets are generated
according to a step 67. In a step 68 the sound packets are
transmitted via the connection 9 to the PC and through the PC
to the network interface 3. The sound packets are transmitted
to the network LAN1 in a step 69.
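
The outgoing path of steps 64 to 67 can be sketched in Python as three small functions. This is a sketch under stated assumptions: the A/D conversion is modelled as quantization to 16-bit PCM, the coding step is a placeholder that merely packs the samples instead of producing speech-vector parameters, and the packet uses the standard 12-byte RTP header with a dynamic payload type of 96. All function names and default values are hypothetical.

```python
import struct


def ad_convert(analog, full_scale=1.0):
    """Step 64: quantize analog values to 16-bit PCM samples, with clipping."""
    pcm = []
    for value in analog:
        clipped = max(-full_scale, min(full_scale, value))
        pcm.append(int(round(clipped / full_scale * 32767)))
    return pcm


def encode(pcm):
    """Steps 65-66 placeholder: pack the samples as big-endian 16-bit values.

    A real codec C21, C22 or C23 would produce speech-vector parameters.
    """
    return struct.pack(f"!{len(pcm)}h", *pcm)


def packetize(frame, sequence, timestamp, payload_type=96, ssrc=1):
    """Step 67: prepend a fixed RTP header (version 2) to one coded frame."""
    header = struct.pack(
        "!BBHII",
        0x80,
        payload_type & 0x7F,
        sequence & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc,
    )
    return header + frame
```

A call such as packetize(encode(ad_convert(samples)), sequence, timestamp) then yields one sound packet that the PC can pass on to the network interface 3 without buffering.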



